Search CORE

104 research outputs found

Semantic Structure based Query Graph Prediction for Question Answering over Knowledge Graph

Author: Li Mingchen
Publication venue: ScholarWorks @ Georgia State University
Publication date: 14/12/2022
Field of study

Building query graphs from questions is an important step in complex question answering over knowledge graph (Complex KGQA). In general, a question can be correctly answered if its query graph is built correctly and the right answer is then retrieved by issuing the query graph against the KG. Therefore, this paper focuses on query graph generation from natural language questions. Existing approaches for query graph generation ignore the semantic structure of a question, resulting in a large number of noisy query graph candidates that undermine prediction accuracies. In this paper, we define six semantic structures from common questions in KGQA and develop a novel Structure-BERT to predict the semantic structure of a question, and then rank the remaining candidates with a BERT-based ranking model. Extensive experiments on two popular benchmarks MetaQA and WebQuestionsSP demonstrate the effectiveness of our method as compared to state-of-the-arts

ScholarWorks @ Georgia State University

A Condensed Transition Graph Framework for Zero-shot Link Prediction with Large Language Models

Author: Li Mingchen
Ling Chen
Zhang Rui
Zhao Liang
Publication venue
Publication date: 16/02/2024
Field of study

Zero-shot link prediction (ZSLP) on knowledge graphs aims at automatically identifying relations between given entities. Existing methods primarily employ auxiliary information to predict tail entity given head entity and its relation, yet face challenges due to the occasional unavailability of such detailed information and the inherent simplicity of predicting tail entities based on semantic similarities. Even though Large Language Models (LLMs) offer a promising solution to predict unobserved relations between the head and tail entity in a zero-shot manner, their performance is still restricted due to the inability to leverage all the (exponentially many) paths' information between two entities, which are critical in collectively indicating their relation types. To address this, in this work, we introduce a Condensed Transition Graph Framework for Zero-Shot Link Prediction (CTLP), which encodes all the paths' information in linear time complexity to predict unseen relations between entities, attaining both efficiency and information preservation. Specifically, we design a condensed transition graph encoder with theoretical guarantees on its coverage, expressiveness, and efficiency. It is learned by a transition graph contrastive learning strategy. Subsequently, we design a soft instruction tuning to learn and map the all-path embedding to the input of LLMs. Experimental results show that our proposed CTLP method achieves state-of-the-art performance on three standard ZSLP dataset

arXiv.org e-Print Archive

TemPL: A Novel Deep Learning Model for Zero-Shot Prediction of Protein Stability and Activity Based on Temperature-Guided Language Modeling

Author: Hong Liang
Li Mingchen
Tan Pan
Zhang Liang
Publication venue
Publication date: 07/04/2023
Field of study

We introduce TemPL, a novel deep learning approach for zero-shot prediction of protein stability and activity, harnessing temperature-guided language modeling. By assembling an extensive dataset of ten million sequence-host bacterial strain optimal growth temperatures (OGTs) and {\Delta}Tm data for point mutations under consistent experimental conditions, we effectively compared TemPL with state-of-the-art models. Notably, TemPL demonstrated superior performance in predicting protein stability. An ablation study was conducted to elucidate the influence of OGT prediction and language modeling modules on TemPL's performance, revealing the importance of integrating both components. Consequently, TemPL offers considerable promise for protein engineering applications, facilitating the design of mutation sequences with enhanced stability and activit

arXiv.org e-Print Archive

PeTailor: Improving Large Language Model by Tailored Chunk Scorer in Biomedical Triple Extraction

Author: Chen M.
Li Mingchen
Zhang Rui
Zhou Huixue
Publication venue
Publication date: 27/10/2023
Field of study

The automatic extraction of biomedical entities and their interaction from unstructured data remains a challenging task due to the limited availability of expert-labeled standard datasets. In this paper, we introduce PETAI-LOR, a retrieval-based language framework that is augmented by tailored chunk scorer. Unlike previous retrieval-augmented language models (LM) that retrieve relevant documents by calculating the similarity between the input sentence and the candidate document set, PETAILOR segments the sentence into chunks and retrieves the relevant chunk from our pre-computed chunk-based relational key-value memory. Moreover, in order to comprehend the specific requirements of the LM, PETAI-LOR adapt the tailored chunk scorer to the LM. We also introduce GM-CIHT, an expert annotated biomedical triple extraction dataset with more relation types. This dataset is centered on the non-drug treatment and general biomedical domain. Additionally, we investigate the efficacy of triple extraction models trained on general domains when applied to the biomedical domain. Our experiments reveal that PETAI-LOR achieves state-of-the-art performance on GM-CIHTComment: this is the first preprint versio

arXiv.org e-Print Archive